# install.packages(c("dplyr", "tidyverse", "sf", "leaflet", "plotly"))
library(dplyr) # data wrangling with pipe syntax
library(tidyverse) # data wrangling
library(sf) # simple features geometry data
library(plotly) # interactive plots
library(leaflet) # interactive maps
options(scipen=999, digits = 2) # format output for data tablesGEO 330 Exercise #2
Evaluate Trends in Divvy Bikesharing Trips, 2013-2023
Purpose
This project examines trends in Divvy bike sharing ridership over time (i.e., between 2013 and 2023) and by community area within the city of Chicago. Students will develop custom R scripts to explore how ridership varies across communities with respect to demographic and socioeconomic characteristics. The exercise uses custom summary tables produced from a comprehensive dataset of over 41 million Divvy bike trips logged between the official launch of the program on June 27, 2013 through 2023.
Read the National Association of City Transportation Officials (NACTO) Half a Billion Trips on Shared Micromobility Since 2010 report which summarizes changes in municipal bikeshare systems over the past decade along with emerging trends in micromobility programs.
Also, consider reading Dimensions of Divvy: Exploring the performance of bikesharing in Chicago in a period of growth and expansion. This study, published in 2018, characterizes the three initial phases of Chicago’s Divvy system beginning with its initial rollout in 2013 through its first and second expansions, in 2015 and 2016, respectively, paying special attention to service and performance gaps.
Step 1. Create a new R project
In RStudio, create a project within a new directory (e.g., SUD330/exercise_03) on your computer. The new directory will serve as your project working directory where you will store files related to this exercise. Create the following folders within the new directory–“data”, “images” and “scripts”–to store exercise-specific files.
R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. You can download and install R via the R Archive Network web pages.
RStudio is an integrated development environment (IDE) for R. It has a set of tools built to help R users be more productive with R (and other languages including Python)). It includes a console, syntax-highlighting editor that supports direct code execution. It also features tools for plotting, viewing history, debugging and managing your workspace. You can download RStudio via the Posit website.
Step 2. Create a new R script
In the “Files” tab of your RStudio interface (conventionally located on the bottom right panel), navigate to the “scripts” folder in your working directory. Create a new R script in the “scripts” directory and assign it an intuitive name (e.g., “Exercise_02.R”). You will use this script to manage and execute all code for this exercise.
Step 3. Install and activate R packages
Students will use several R packages to complete the exercise, including “dplyr” and “tidyverse” for data formatting and wrangling, “sf” (simple feature) for importing, transforming geographic data, “plotly” for creating interactive figures, and “leaflet” for producing interactive maps. Copy and paste the following code to activate the associated packages in R. It may also be necessary to install the packages prior to activating. If so, uncomment the install.packages command.
Step 4. Download presentation template and process data
As with previous exercises, you are asked to combine and submit all materials into a single presentation. The basic template for this exercise can be downloaded via this link to the PowerPoint slides on the course GitHub.
This exercise makes use of data from a variety of sources. Custom summary tables of Divvy trip origins and destinations by date and community area between 2013 and 2022 were adapted from raw trip information made available via the City of Chicago’s data portal and Divvy bikeshare system data web pages. Frequencies of trips will be mapped by city of Chicago community area boundaries imported directly from the city of Chicago’s data portal. Lastly, variations in trips will be evaluated with respect to neighborhood demographic socioeconomic measures downloaded from the Chicago Health Atlas. Copy and paste the code below to import the necessary data from the course GitHub and the city of Chicago’s data portal.
The Chicago Health Atlas publishes data on hundreds of sociodemographic and public health indicators for communities throughout the city. The website allows for searching and exploring various metrics via maps, charts and tables. Indicator data can also be downloaded by community area, census tract and zip code. Refer to the Atlas’ how to use page for more information.
# import Divvy summary trip data frames from course GitHub
load(url("https://github.com/justenvirons/pedagogy/raw/main/docs/exercises_transportation/data/divvy_trips_summary_tables.RData"))
# import City of Chicago community areas (N=77) GEOJSON from Chicago data portal and transform to geographic projection for mapping (epsg:4326)
community_areas <- st_read("https://data.cityofchicago.org/api/geospatial/cauq-8yn6?method=export&format=GeoJSON") %>%
st_transform(4326) %>%
mutate(`comarea_id` = as.numeric(area_numbe),
comarea_name = str_to_title(community)) %>%
select(comarea_id, comarea_name)
# import socioeconomic characteristics by community area (adapted from the Chicago Health Atlas) from the course GitHub
ses_attributes <- read_csv("https://github.com/justenvirons/pedagogy/raw/main/docs/exercises_transportation/data/chicago_health_atlas_indicators.csv")
# join community areas with socioeconomic status (SES) attributes to create custom dataset
# add one to rank values to adjust range from 0-76 to 1-77
# create and label quintile (ntile) categories for each SES measure
fx_ntile <- function(x, na.rm=TRUE)(ntile(x,5))
fx_ntile_label <- function(x, na.rm=TRUE)(case_when(x==1 ~ "Lowest",
x==2 ~ "Low",
x==3 ~ "Moderate",
x==4 ~ "High",
x==5 ~ "Highest"))
community_areas_ses <- community_areas %>%
left_join(ses_attributes %>% select(-Name), by="comarea_id") %>%
mutate_at(vars(svi_2020:poverty_2021), list(ntile = fx_ntile)) %>%
mutate_at(vars(svi_2020_ntile:poverty_2021_ntile), list(label = fx_ntile_label)) %>%
select(-ends_with("ntile"))The summary tables used for this exercise was assembled using multiple datasets. Code for creating the comprehensive trip dataset is available on the course GitHub.
Step 5. Explore systemwide trends in Divvy trips over time
Over the past decade the Divvy system has logged over 35 million trips. Figure 1 shows the daily cumulative trips from Divvy’s launch in June 2013 through December 2023.
In slide 1 of your presentation, describe the cumulative growth of Divvy’s systemwide ridership over the past decade. On what date did the system surpass 10 million trips? How long did it take before Divvy logged the next 10 million trips? How do these overall trends in Chicago’s bikeshare system compare to systems nationwide? (Refer to the figure on page 6 of NACTO’s micromobility report.)
Now examine monthly patterns of Divvy ridership over the same period (Figure 2). Hover over each bar to see how ridership differs from month to month. In slide 2 of your presentation, note any patterns and/or variations you see in monthly ridership. What factors do you feel explain any differences in ridership by month? What other non-seasonal factors do you feel may have influenced ridership? Support your findings by referencing statistics from the figure.
Lastly, let’s examine daily patterns of Divvy ridership over the past decade (Figure 3). In slide 3 of your presentation, note any patterns and/or variations you see in daily ridership. What factors do you feel explain any daily differences in ridership? Again, support your findings by referencing statistics from the figure.
Step 6. Explore total and average trips by community area
In this step we examine variations in Divvy trip patterns across the city by community area (N=77). Figure 4 below shows the distribution of total Divvy trips between 2013 and 2023 across the city. Hover over the community area boundaries to observe their respective total (cumulative) and average daily riderships over the past decade. Daily ridership per 10,000 residents (i.e., ridership standardized by the community area population) is also indicated.
In slide 4 of your presentation, note any patterns and/or variations you see in total ridership by community area. What factors do you feel explain differences in total and daily ridership? Support your findings by referencing statistics from the figure.
Step 7. Explore trends by socioeconomic status
In this step we examine trends in Divvy usage across communities with different socioeconomic characteristics. For this step you will need to modify the below R code to create a custom map, plot and table to inform your analysis. An example analysis is summarized here.
Select and map socioeconomic indicator
First, select one of the five socioeconomic measures downloaded from the Chicago Health Atlas to examine Divvy ridership, namely: (1) social vulnerability index (svi_2020); (2) neighborhood safety (safety_2018); (3) median household income (hhinc_2021); (4) hardship index (hardship_2021); or (5) household poverty (poverty_2021). (Follow the indicator links above for a more thorough description of each indicator via the health atlas.)
After selecting one of the indicators, copy and paste the following code into your R script and follow the instructions in the video to create a custom map. Copy the map to slide 5 of your presentation and describe any visual relationship you see (or don’t see) between ridership and the socioeconomic indicator you selected.
# create total trips (over study period) by community area dataset
trips_by_communityarea <- trips_by_yearmonth_ca %>%
drop_na(yearmonth) %>%
filter(yearmonth == max(yearmonth))
# join trips to community areas with ses attributes
trips_by_communityarea_ses <- community_areas_ses %>%
left_join(trips_by_communityarea, by="comarea_id") %>%
select(-starts_with("yearmonth"))
# return total days included in dataset
total_days <- as.integer(count(trips_by_date))
# Edit the following three lines with selected indicator name/title, value and ntile variables
ses_name <- "Poverty Rate" # EDIT HERE
trips_by_communityarea_ses$ses_value <- trips_by_communityarea_ses$poverty_2021 # EDIT HERE
trips_by_communityarea_ses$ses_ntile <- trips_by_communityarea_ses$poverty_2021_ntile_label # EDIT HERE
community_areas_ses$ses_ntile <- community_areas_ses$poverty_2021_ntile_label # EDIT HERE
# create interactive map using leaflet
# color palette
map_pal <- colorQuantile(palette = "RdYlBu",
domain = trips_by_communityarea_ses$ses_value,
n=5,
reverse=TRUE)
# hover labels
communityarea_labels <- sprintf(
paste0("<strong>%s</strong><br/>
<i>Trips (from):</i> %s<br/>
<i>Trips (to):</i> %s<br/>
<i>Avg daily trips (from):</i> %0.1f<br/>
<i>Avg daily trips (to):</i> %0.1f<br/>
<i>Avg daily trips (from) per 10K pop:</i> %0.1f<br/>
<i>Avg daily trips (to) per 10K pop:</i> %0.1f<br/><i>",
ses_name,":</i> %0.1f<br/>"),
trips_by_communityarea_ses$comarea_name,
prettyNum(trips_by_communityarea_ses$fr_trips_cumul, big.mark=","),
prettyNum(trips_by_communityarea_ses$to_trips_cumul, big.mark=","),
trips_by_communityarea_ses$fr_trips_cumul/total_days,
trips_by_communityarea_ses$to_trips_cumul/total_days,
trips_by_communityarea_ses$fr_trips_cumul/total_days/trips_by_communityarea_ses$pop_2021*1000,
trips_by_communityarea_ses$to_trips_cumul/total_days/trips_by_communityarea_ses$pop_2021*1000,
trips_by_communityarea_ses$ses_value) %>%
lapply(htmltools::HTML)
# interactive map
leaflet(data = trips_by_communityarea_ses) %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addPolygons(
fillColor = ~map_pal(ses_value),
label = communityarea_labels,
weight = 2,
opacity = 1,
color = "white",
dashArray = "3",
fillOpacity = 0.7) %>%
addLegend("bottomleft",
pal = map_pal,
values = ~ses_value, # EDIT HERE
title = paste0(ses_name," Quintiles"),
opacity = 1)Examine Divvy ridership trends by socioeconomic indicator over time
Now you will evaluate Divvy ridership over the past decade for communities with different socioeconomic characteristics.
Using the same indicator you selected above, copy and paste the following two code chunks into your R script and follow the instructions in the video to create a custom table and plot. Copy both the plot into slide 6 of your presentation and describe any relationships you see (or don’t see) between ridership and the socioeconomic indicator you selected over the past ten years. Have variations in ridership changed differently over time with respect to community socioeconomic condition?
trips_by_yearmonth_by_ses <- trips_by_yearmonth_ca %>%
left_join(community_areas_ses %>% st_drop_geometry(), by="comarea_id") %>%
group_by(yearmonth, ses_ntile) %>% #
summarise(tot_pop = sum(pop_2021),
fr_trips = sum(fr_trips),
to_trips = sum(to_trips),
fr_trips_cumul = sum(fr_trips_cumul),
to_trips_cumul = sum(to_trips_cumul),
fr_dailytrips = fr_trips/total_days,
to_dailytrips = to_trips/total_days,
fr_trips_per1k = fr_trips/tot_pop*1000,
to_trips_per1k = to_trips/tot_pop*1000) %>%
drop_na(tot_pop) %>%
mutate(order = case_when(ses_ntile == "Lowest" ~ 1,
ses_ntile == "Low" ~ 2,
ses_ntile == "Moderate" ~ 3,
ses_ntile == "High" ~ 4,
ses_ntile == "Highest" ~ 5),
ses_label = factor(ses_ntile, levels = c("Highest",
"High",
"Moderate",
"Low",
"Lowest"))) %>%
arrange(yearmonth, order)
# create a color palette
pal_quintile_colors <- c("darkred", "orange", "yellow", "lightblue","darkblue")
pal_quintile <- colorFactor(palette = pal_quintile_colors, domain = levels(trips_by_yearmonth_by_ses$ses_label), ordered = T)
plot_ly(data = trips_by_yearmonth_by_ses,
x = ~yearmonth,
y = ~fr_trips_per1k,
type = 'scatter',
mode = 'marker+line',
color = ~ses_label,
colors = pal_quintile_colors,
text = ~ses_label,
hovertemplate = '%{text}: %{y:0.1f}<extra></extra>') %>%
config(displaylogo = FALSE) %>%
layout(
xaxis = list(title = ""), # EDIT HERE
yaxis = list(title = "Divvy Trips per 1,000 Residents"),
legend=list(
font = list(size = 10),
xanchor="center",
orientation = "h",
x = 0.5, y=-0.1),
hovermode = "x unified")Summary of ridership by socioeconomic indicator
Here you create a final custom table with your summary information by socioeconomic category. The code saves the output table to your project data directory for further analysis. Copy and paste the table into slide 6 of your presentation (along with the above plot) and briefly summarize your findings.
trips_by_ses_quintile <- trips_by_yearmonth_by_ses %>%
group_by(ses_ntile) %>%
summarise(order = max(order),
tot_pop = max(tot_pop),
fr_trips = sum(fr_trips),
to_trips = sum(to_trips)) %>%
mutate(tot_pop_pct = tot_pop/sum(tot_pop),
fr_trips_pct = fr_trips/sum(fr_trips),
to_trips_pct = to_trips/sum(to_trips))
write_csv(trips_by_ses_quintile,"data/trips_by_ses_quintile.csv")